Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
- High-quality benchmarks are essential for evaluating the reasoning and retrieval capabilities of large language models (LLMs). However, curated datasets are not a lasting solution for this purpose, because they are prone to data leakage and inflated performance results. To address these challenges, we propose PhantomWiki: a pipeline to generate unique and factually consistent document corpora with diverse question-answer pairs. Unlike prior work, PhantomWiki is neither a fixed dataset nor based on any existing data. Instead, a new PhantomWiki instance is generated on demand for each evaluation. We vary the question difficulty and the corpus size to disentangle reasoning and retrieval capabilities, respectively, and find that PhantomWiki datasets are surprisingly challenging for frontier LLMs. Thus, we contribute a scalable and data-leakage-resistant framework for disentangled evaluation of reasoning, retrieval, and tool-use abilities.
  Free, publicly-accessible full text available July 16, 2026
- Developing prompt-based methods with Large Language Models (LLMs) requires making numerous decisions, which give rise to a combinatorial search problem over hyper-parameters. Exhaustively evaluating this space can be time-consuming and costly. In this paper, we propose an adaptive approach to explore it. We exploit the fact that often only a few samples are needed to identify clearly superior or inferior settings, and that many evaluation tests are highly correlated. We use multi-armed bandits to sequentially identify the next (method, validation sample) pair to evaluate and low-rank matrix factorization to fill in missing evaluations. We carefully assess the efficacy of our approach on several competitive benchmark problems and show that it can identify the top-performing method using only 5–15% of the typical resources, resulting in 85–95% LLM cost savings. Our code is available at https://github.com/kilian-group/banditeval.
  Free, publicly-accessible full text available June 11, 2026
- Disease is a key driver of community and ecosystem structure, especially when it strikes foundation species. In the widespread marine foundation species eelgrass (Zostera marina), outbreaks of wasting disease have caused large-scale meadow collapse in the past, and the causative pathogen, Labyrinthula zosterae, is commonly found in meadows globally. Research to date has mainly focused on abiotic environmental drivers of seagrass wasting disease, but there is strong evidence from other systems that biotic interactions such as herbivory can facilitate plant diseases. How biotic interactions influence seagrass wasting disease in the field is unknown, but it is potentially important for understanding the dynamics of this globally valuable and declining habitat. Here, we investigated links between epifaunal grazers and seagrass wasting disease using a latitudinal field study across 32 eelgrass meadows distributed from southeastern Alaska to southern California. From 2019 to 2021, we conducted annual surveys to assess eelgrass shoot density, morphology, epifauna community, and the prevalence and lesion area of wasting disease infections. We integrated field data with satellite measurements of sea surface temperature and used structural equation modeling to test the magnitude and direction of possible drivers of wasting disease. Our results show that grazing by small invertebrates was associated with a 29% increase in the prevalence of wasting disease infections and that both the prevalence and lesion area of disease increased with total epifauna abundance. Furthermore, these relationships differed among taxa: disease levels increased with snail (Lacuna spp.) and idoteid isopod abundances but were not related to the abundance of ampithoid amphipods. This field study across 23° of latitude suggests a prominent role for invertebrate consumers in facilitating disease outbreaks, with potentially large impacts on coastal seagrass ecosystems.
  Free, publicly-accessible full text available January 1, 2026
- Seagrass meadows are essential habitats that support marine biodiversity and coastal communities while sequestering carbon, filtering water, and stabilizing coastal sediments. Warming temperatures stress seagrass meadows and can facilitate seagrass wasting disease, contributing to large-scale diebacks of seagrass meadows. Here, we demonstrate how high-resolution imagery, collected by uncrewed aerial vehicle (UAV) and validated by in situ sampling, can quantify seagrass responses to disease and thermal stress.
- Eelgrass creates critical coastal habitats worldwide and fulfills essential ecosystem functions as a foundation seagrass. Climate warming and disease threaten eelgrass, causing mass mortalities and cascading ecological impacts. Subtidal meadows are deeper than intertidal meadows and may also provide refuge from the temperature-sensitive seagrass wasting disease. From cross-boundary surveys of 5761 eelgrass leaves from Alaska to Washington, assisted by a machine-learning algorithm, we measured outbreak conditions. Across summers 2017 and 2018, disease prevalence was 16% lower for subtidal than for intertidal leaves; in both tidal zones, disease risk was lower for plants in cooler conditions. Even in subtidal meadows, which are more environmentally stable and sheltered from temperature and other stressors common for intertidal eelgrass, we observed high disease levels, with half of the sites exceeding 50% prevalence. Models predicted reduced disease prevalence and severity under cooler conditions, confirming a strong interaction between disease and temperature. In both tidal zones, prevalence was lower in denser eelgrass meadows, suggesting that disease is suppressed in healthy, higher-density meadows. These results underscore the value of subtidal eelgrass and of meadows in cooler locations as refugia, indicate that cooling can suppress disease, and have implications for eelgrass conservation and management under future climate change scenarios. This article is part of the theme issue ‘Infectious disease ecology and evolution in a changing world’.
- Non-negative matrix factorization (NMF) is a widely celebrated algorithm for matrix decomposition that guarantees non-negative factors. The underlying optimization problem is computationally intractable, yet in practice, gradient-descent-based methods often find good solutions. In this paper, we revisit the NMF optimization problem and analyze its loss landscape in non-worst-case settings. It has recently been observed that gradients in deep networks tend to point towards the final minimizer throughout the optimization procedure. We show that a similar property holds (with high probability) for NMF, provably in a non-worst-case model with a planted solution, and empirically across an extensive suite of real-world NMF problems. Our analysis predicts that this property becomes more likely as the number of parameters grows, and experiments suggest that a similar trend might also hold for deep neural networks, which would turn increasing dataset and model sizes into a blessing from an optimization perspective.
 An official website of the United States government